1 Setup
1.1 Imports
2 Dataset Description
2.1 Sample Composition
This analysis uses N=25 spider web images collected from spiders exposed to different agricultural chemicals:
| Group | N | Drug Type | Mechanism of Action |
|---|---|---|---|
| CONTROL | 5 | None | Baseline web structure |
| CIPERMETRINA | 5 | Insecticide (Pyrethroid) | Synthetic pyrethroid; disrupts sodium channels causing tremors and impaired motor control |
| ENDOSULFAN | 5 | Insecticide (Organochlorine) | GABA antagonist; causes seizures and neurological disruption (banned in many countries) |
| GLIFOSATO | 5 | Herbicide (Glyphosate) | Glycine analog; disputed neurotoxicity in arthropods |
| SPINOSAD | 5 | Insecticide (Organic) | Bacterial metabolite; nicotinic acetylcholine agonist causing paralysis |
The original dataset lacks several critical experimental details:
- Spider species identification
- Drug dosages and exposure protocols
- Spider age, sex, and size
- Environmental conditions (temperature, humidity, light)
- Time post-exposure when webs were photographed
This limits biological interpretation and generalizability of findings.
2.2 Biological Context: Why Study Spider Webs?
Spider web construction is a sensitive bioassay for neurotoxicity. Building a web requires:
- Precise motor control: Accurate silk placement and anchor point selection
- Spatial memory: Following geometric patterns and maintaining symmetry
- Proprioception: Body position awareness during construction
Drugs affecting the nervous system disrupt these processes, manifesting as structural changes visible in the web geometry.
2.2.1 Historical Precedent
Spider web pharmacology dates to 1948 (Witt et al.): drugs like caffeine, LSD, and marijuana produce characteristic web deformations. Modern applications include:
- Environmental toxicology monitoring
- Pesticide safety assessment
- Neurological drug screening
2.2.2 Expected Drug Effects
Based on mechanism of action:
- Pyrethroids (CIPERMETRINA): Sodium channel disruption → tremors → irregular silk placement
- Organochlorines (ENDOSULFAN): GABA antagonist → seizures → chaotic structures
- Glyphosate (GLIFOSATO): Herbicide with disputed neurotoxicity → unclear effect expected
- Spinosad: Nicotinic agonist → paralysis → incomplete or simplified webs
2.3 Why TDA for This Problem?
Traditional image analysis methods (edge detection, Fourier analysis, texture features) struggle with spider webs because:
- Geometric irregularity: Webs don’t follow rigid templates
- Scale variation: Cell sizes vary across the web
- Partial structures: Incomplete or torn webs
TDA Advantages:
- Topological invariance: Robust to rotation, scaling, and small perturbations
- Multi-scale analysis: Persistence diagrams capture features at all scales simultaneously
- Interpretable features: H0 = fragmentation, H1 = loop structure (cells/meshes)
- No template required: Data-driven rather than model-based
We hypothesize that drug-induced neurological impairment will manifest as:
- H1 features (closed loops) decreasing under drugs that impair motor coordination
- H0 features (fragmentation) increasing if drugs cause severe behavioral disruption
- Persistence entropy decreasing under drugs that produce irregular cell patterns
Topological features should provide more sensitive detection than simple metrics like web area or thread count.
3 Data Loading
3.1 Load Images
Groups: SubString{String}["CIPERMETRINA", "CONTROL", "ENDOSULFAN", "GLIFOSATO", "SPINOSAD"]
Samples per group:
CIPERMETRINA: 5
CONTROL: 5
ENDOSULFAN: 5
GLIFOSATO: 5
SPINOSAD: 5
3.2 Sample Size and Statistical Power
Dataset Size: N=25 total (5 samples per group)
This sample size has important implications for statistical inference:
3.2.1 Detection Limits
With N=5 per group, only very large effect sizes can be detected reliably:
- Small effects (|d| < 0.5): negligible power (~10%)
- Large effects (|d| ≈ 0.8): still low power (roughly 20% for a two-sample test at α = 0.05)
- Very large effects (|d| ≳ 2): adequate power (~80%)
3.2.2 Statistical Concerns
- Wide confidence intervals: Effect size estimates are imprecise
- High variance: Cross-validation results show substantial standard deviations
- No independent validation: All results use LOOCV on the same 25 samples
- Overfitting risk: Classification models may capture sample-specific noise
3.2.3 Study Interpretation
Given these limitations, this analysis should be interpreted as:
- ✓ Proof-of-concept demonstrating TDA methodology
- ✓ Exploratory analysis generating hypotheses
- ✓ Method validation showing feasibility

- ✗ NOT definitive biological conclusions
- ✗ NOT generalizable without replication
- ✗ NOT powered for detecting subtle effects
3.2.4 Recommendations for Future Work
- Minimal viable study: N ≥ 20 per group
- Well-powered study: N ≥ 30 per group
- Independent validation cohort: 70/30 train/test split or multi-site data
Results presented here represent upper bounds on performance (likely optimistic due to overfitting) and should guide future adequately-powered studies.
3.3 Point Cloud Sampling
We extract 1000 points from each web using the farthest-point sampling algorithm.
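The sampling step itself runs in Julia in this notebook; as an illustration, here is a minimal NumPy sketch of greedy farthest-point sampling (the function name and the random stand-in point cloud are hypothetical):

```python
import numpy as np

def farthest_point_sample(points, k, seed=0):
    """Greedy farthest-point sampling: start from one point, then
    repeatedly add the point farthest from everything chosen so far."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(points)))]
    # distance from every point to its nearest chosen point
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))          # farthest remaining point
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return points[chosen]

# random stand-in for thresholded web-pixel coordinates
cloud = np.random.default_rng(1).random((5000, 2))
sample = farthest_point_sample(cloud, 100)
```

Compared with uniform random sampling, this tends to cover the whole web evenly, which keeps thin radial threads represented in the point cloud.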
4 Persistence Diagrams
4.1 Compute Rips Filtration
4.2 Extract H0 and H1
Number of samples: 25
H0 diagram example (first non-empty):
Sample 1: 1000 features
H1 diagram example (first non-empty):
Sample 1: 71 features
4.3 Representative Web Examples
Before diving into feature extraction and statistics, let’s visualize representative spider webs from each treatment group along with their persistence diagrams. Each panel shows:
- Top left: Persistence diagram (birth-death plot showing H1 cycles)
- Top right: Web image intensity heatmap
- Bottom: Point cloud sample used for TDA computation
5 Feature Extraction
We extract several statistical summaries from each persistence diagram. Here’s what each feature measures and how it relates to web structure:
| Feature | What it measures | Web interpretation |
|---|---|---|
| n_features | Number of H1 cycles detected | Number of closed cells in the web |
| total_persistence | Sum of all cycle lifespans | Overall topological complexity |
| median_persistence | Median cycle lifespan | Typical cell “robustness” (robust to outliers) |
| max_persistence | Largest cycle lifespan | Most prominent hole or cell |
| entropy | Uniformity of cycle lifespans | High = regular cells; Low = irregular cells |
| median_birth | Median scale at which cycles appear | Typical cell size (robust to outliers) |
These features transform complex persistence diagrams into interpretable numbers that can be compared statistically across treatment groups.
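As an example of how these summaries are computed, here is a minimal Python sketch of persistence entropy from a list of (birth, death) pairs (illustrative only; the notebook computes its features in Julia):

```python
import numpy as np

def persistence_entropy(diagram):
    """Shannon entropy of the normalized lifespans (death - birth).
    Maximal (= log n) when all n cycles persist equally long."""
    life = np.array([d - b for b, d in diagram], dtype=float)
    p = life / life.sum()
    return float(-(p * np.log(p)).sum())

# four equally persistent cycles -> maximal entropy log(4)
uniform = persistence_entropy([(0, 1), (1, 2), (0, 1), (2, 3)])
# one dominant cycle -> lower entropy (irregular cell pattern)
skewed = persistence_entropy([(0, 10), (0, 0.1), (0, 0.1), (0, 0.1)])
```

This matches the interpretation in the table: uniform cell lifespans give high entropy, a few dominant holes give low entropy.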
5.1 Rich Statistics - H1 (Cycles)
| Row | Specie | n_features | total_persistence | median_persistence | max_persistence | entropy |
|---|---|---|---|---|---|---|
| | SubStrin… | Int64 | Float64 | Float64 | Float64 | Float64 |
| 1 | CIPERMETRINA | 71 | 831.438 | 9.631 | 66.449 | 4.079 |
| 2 | CIPERMETRINA | 69 | 779.229 | 8.354 | 37.007 | 4.076 |
| 3 | CIPERMETRINA | 66 | 925.464 | 11.833 | 58.847 | 4.033 |
| 4 | CIPERMETRINA | 69 | 777.478 | 10.289 | 28.721 | 4.138 |
| 5 | CIPERMETRINA | 50 | 499.91 | 8.61 | 22.183 | 3.811 |
| 6 | CONTROL | 68 | 473.123 | 6.0 | 19.114 | 4.159 |
| 7 | CONTROL | 81 | 577.74 | 6.403 | 14.111 | 4.357 |
| 8 | CONTROL | 138 | 1086.26 | 6.606 | 38.022 | 4.836 |
| 9 | CONTROL | 108 | 764.702 | 6.43 | 16.632 | 4.643 |
| 10 | CONTROL | 93 | 626.066 | 6.275 | 12.905 | 4.507 |
| 11 | ENDOSULFAN | 87 | 648.155 | 6.298 | 31.075 | 4.379 |
| 12 | ENDOSULFAN | 68 | 543.676 | 6.945 | 22.366 | 4.145 |
| 13 | ENDOSULFAN | 80 | 812.353 | 6.842 | 80.652 | 4.111 |
| 14 | ENDOSULFAN | 59 | 610.397 | 6.896 | 58.875 | 3.785 |
| 15 | ENDOSULFAN | 61 | 616.653 | 8.022 | 55.703 | 3.941 |
| 16 | GLIFOSATO | 60 | 965.556 | 10.522 | 107.758 | 3.752 |
| 17 | GLIFOSATO | 43 | 695.187 | 10.05 | 60.592 | 3.474 |
| 18 | GLIFOSATO | 72 | 868.896 | 8.868 | 79.585 | 4.049 |
| 19 | GLIFOSATO | 54 | 695.542 | 9.893 | 68.599 | 3.765 |
| 20 | GLIFOSATO | 67 | 866.073 | 9.56 | 51.756 | 4.025 |
| 21 | SPINOSAD | 70 | 793.129 | 9.768 | 37.691 | 4.093 |
| 22 | SPINOSAD | 69 | 827.155 | 10.814 | 53.565 | 4.099 |
| 23 | SPINOSAD | 72 | 780.85 | 8.964 | 38.603 | 4.137 |
| 24 | SPINOSAD | 64 | 747.34 | 9.938 | 27.148 | 4.04 |
| 25 | SPINOSAD | 78 | 831.255 | 9.219 | 57.754 | 4.216 |
5.2 H0 (Connected Components) - Not Analyzed
All spider webs in this dataset remain structurally connected (single component), meaning H0 persistence provides minimal discriminatory information between treatment groups.
The absence of web fragmentation suggests:
- Spiders complete web construction despite drug exposure
- Drug effects manifest primarily as topological changes within connected structures (H1 features)
- Changes in loop/cell patterns (H1) rather than complete structural breakdown
Therefore, we focus our analysis on H1 (one-dimensional persistence) which captures the relevant differences in cell structure and regularity.
5.3 Feature Matrices
H1 features: [:n_features, :total_persistence, :median_persistence, :std_persistence, :max_persistence, :q25, :q50, :q75, :q90, :entropy, :median_birth, :birth_range]
Feature matrix size: (25, 12)
5.4 Vectorized Diagram Features
Vectorized features dimension: 362
6 Exploratory Visualization
6.1 Summary Statistics by Drug
Mean Statistics by Group:
CIPERMETRINA:
- Mean cycles (H1): 65.0
- Mean entropy: 4.027
- Mean max persistence: 42.642
CONTROL:
- Mean cycles (H1): 97.6
- Mean entropy: 4.5
- Mean max persistence: 20.157
ENDOSULFAN:
- Mean cycles (H1): 71.0
- Mean entropy: 4.072
- Mean max persistence: 49.734
GLIFOSATO:
- Mean cycles (H1): 59.2
- Mean entropy: 3.813
- Mean max persistence: 73.658
SPINOSAD:
- Mean cycles (H1): 70.6
- Mean entropy: 4.117
- Mean max persistence: 42.952
6.2 Betti Curves by Drug
6.3 Average Persistence Images
6.4 Within-Group Variability
Some drug groups show more heterogeneity in web structure than others. Below we show the most and least complex webs (by entropy) within each group to illustrate this variability. High entropy indicates regular, uniform cell sizes; low entropy indicates irregular cells.
6.5 Feature Distributions by Group
The boxplots below show how key TDA features are distributed across treatment groups. This helps visualize group differences before formal statistical testing.
7 Distance Analysis
The Wasserstein distance (also called Earth Mover’s Distance) measures how different two persistence diagrams are.
Intuition: Imagine each point in a persistence diagram as a pile of dirt. The Wasserstein distance is the minimum “work” needed to transform one diagram into another by moving dirt around.
Why use it for TDA?
- Specifically designed for comparing persistence diagrams
- Captures both the locations of topological features and how they should be matched
- Has metric properties, enabling use with standard machine learning methods (like KNN)
Notation: Wasserstein(p, q) — we use p=1 (sum of movements) and q=2 (Euclidean ground metric).
A small Wasserstein distance means two webs have similar topological structure; a large distance means their persistence diagrams differ substantially.
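For illustration, the optimal-matching construction behind W₁ can be sketched with SciPy's assignment solver (a simplified Python stand-in for the notebook's Julia implementation; each diagram point may be matched to a point of the other diagram or sent to the diagonal at cost (death − birth)/√2):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def wasserstein_1(dgm_a, dgm_b):
    """1-Wasserstein distance between persistence diagrams with
    Euclidean ground metric, via an augmented assignment problem."""
    A, B = np.atleast_2d(dgm_a), np.atleast_2d(dgm_b)
    n, m = len(A), len(B)
    BIG = 1e9
    C = np.full((n + m, n + m), BIG)
    # point-to-point matching costs
    C[:n, :m] = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    # matching a diagram point to the diagonal (one slot per point)
    C[np.arange(n), m + np.arange(n)] = (A[:, 1] - A[:, 0]) / np.sqrt(2)
    C[n + np.arange(m), np.arange(m)] = (B[:, 1] - B[:, 0]) / np.sqrt(2)
    # unused diagonal slots match each other for free
    C[n:, m:] = 0.0
    rows, cols = linear_sum_assignment(C)
    return float(C[rows, cols].sum())
```

For nearby diagrams the solver matches points directly (small cost); isolated points are cheaper to absorb into the diagonal, which is exactly the "pile of dirt" intuition above.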
MDS converts a distance matrix into low-dimensional coordinates for visualization:
- Start with pairwise distances between all samples
- Find 2D or 3D coordinates that preserve these distances as well as possible
- Plot the coordinates — samples close together have similar features
How to interpret MDS plots:
- Clusters = groups of samples with similar topological features
- Separation between clusters = distinct TDA signatures between groups
- Overlap between groups = ambiguity; these groups are hard to distinguish topologically
MDS is purely for visualization — it doesn’t make statistical claims, but helps us see patterns before formal testing.
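Classical (Torgerson) MDS follows the steps above directly; a minimal NumPy sketch (illustrative, not the notebook's Julia code):

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical MDS: double-center the squared distance matrix,
    then embed using the top eigenvectors of the resulting Gram matrix."""
    D = np.asarray(D, dtype=float)
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # Gram matrix of centered points
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]       # keep largest eigenvalues
    w = np.clip(w[idx], 0, None)
    return V[:, idx] * np.sqrt(w)

# three collinear points (distances 1, 1, 2) embed exactly in 1D
D = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
X = classical_mds(D, dim=1)
```

For a Euclidean distance matrix the embedding is exact up to rotation; for Wasserstein distances it is an approximation, which is why MDS plots are read qualitatively.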
7.1 Wasserstein Distance Matrix
25×25 Matrix{Float64}:
0.0 125.421 149.505 149.996 … 161.391 207.911 158.323 263.38
125.421 0.0 153.59 135.848 157.259 215.999 133.712 267.336
149.505 153.59 0.0 182.613 151.695 251.085 160.943 336.517
149.996 135.848 182.613 0.0 105.588 144.571 101.393 181.717
266.832 230.788 308.53 212.951 244.25 213.088 184.152 254.225
299.746 276.566 391.579 242.585 … 306.842 278.482 255.82 263.543
315.641 293.247 406.906 282.731 338.736 337.27 299.099 289.382
600.395 578.767 652.597 637.794 641.536 694.576 651.218 747.219
416.68 389.453 509.701 398.214 456.635 462.956 411.355 407.379
385.303 360.164 482.388 347.347 412.4 399.277 363.692 338.669
309.979 283.324 393.786 278.028 … 320.495 319.393 302.347 303.891
244.195 225.081 334.331 179.349 241.556 202.346 194.103 214.974
159.134 194.802 257.867 252.589 249.588 264.4 267.983 318.443
246.778 232.008 299.076 274.348 253.482 255.08 254.396 379.716
173.598 196.275 247.833 213.122 197.099 210.929 185.743 274.036
282.592 300.674 272.063 344.175 … 282.728 327.157 303.107 433.23
397.027 351.363 382.757 400.73 370.031 361.843 377.011 451.677
128.08 172.289 184.227 192.1 154.853 187.414 197.367 299.317
325.133 307.064 329.953 287.327 271.733 231.725 283.73 356.261
141.482 133.883 96.7042 178.781 134.393 235.819 150.86 327.573
144.326 106.155 184.168 105.203 … 137.01 171.337 131.314 215.534
161.391 157.259 151.695 105.588 0.0 163.167 127.829 253.459
207.911 215.999 251.085 144.571 163.167 0.0 165.233 211.368
158.323 133.712 160.943 101.393 127.829 165.233 0.0 211.254
263.38 267.336 336.517 181.717 253.459 211.368 211.254 0.0
7.2 Distance Metric Comparison: Wasserstein vs Bottleneck
Different distance metrics capture different aspects of topological dissimilarity. We compare two fundamental persistence diagram distances:
| Property | Wasserstein W₁ | Bottleneck d∞ |
|---|---|---|
| Definition | Optimal matching cost (total transport) | Worst-case matching cost (max single distance) |
| Formula | Sum of all point distances in optimal matching | Maximum single point distance in optimal matching |
| Sensitivity | Sensitive to all points (global measure) | Dominated by outliers (local measure) |
| Stability | More stable in presence of noise | Can be unstable with outliers |
| Interpretation | “Average structural difference” | “Maximum local difference” |
| Computation | O(n³) via Hungarian algorithm | O(n^2.5) via min-cost flow |
When to use which?
- Wasserstein: when all topological features matter; captures overall structural difference
- Bottleneck: when the largest discrepancy matters; robust to small noise but sensitive to big changes
7.2.1 Compute Both Distance Matrices
Computing Wasserstein distance matrix...
Computing Bottleneck distance matrix...
Distance matrix statistics:
Wasserstein - Min: 87.923, Max: 799.563, Mean: 296.694
Bottleneck - Min: 3.207, Max: 53.879, Mean: 24.31
7.2.2 Compare Distance Distributions
Correlation between distance metrics: 0.209
7.2.3 Classification Performance Comparison
=== Classification Accuracy Comparison ===
Wasserstein W₁: 44.0%
Bottleneck d∞: 20.0%
⇒ Wasserstein OUTPERFORMS Bottleneck
Global structure more informative than local extrema
7.2.4 Visualize Distance Matrices
7.2.5 Interpretation
=== Distance Metric Analysis ===
✗ Low correlation (r = 0.21)
Metrics capture fundamentally different structures
Choice significantly impacts conclusions
Recommendation:
→ Use Wasserstein distance for this dataset
Better classification performance
Captures overall structural differences relevant to drug effects
7.3 Euclidean Distance on Rich Stats
7.4 MDS Embeddings
8 Statistical Tests
Our statistical analysis follows a three-stage approach:
- Omnibus test (Kruskal-Wallis): Do ANY groups differ from each other?
- Pairwise comparisons (Permutation tests): WHICH drugs differ from control?
- Effect sizes (Cohen’s d): HOW MUCH do they differ?
This hierarchical approach controls false positives while providing interpretable effect magnitudes.
The Kruskal-Wallis test is a non-parametric alternative to one-way ANOVA. We use it here because:
- No normality assumption: Unlike ANOVA, it doesn’t require the data to follow a normal distribution — important for TDA features which may have unusual distributions
- Robust to outliers: Uses ranks instead of raw values, so extreme points don’t dominate
- Works with small samples: Reliable even with limited data per group
How to interpret the p-value:
- p < 0.05: Strong evidence that at least one group differs from the others (marked with *)
- p ≥ 0.05: Insufficient evidence to conclude groups differ
Why not use ANOVA? With small sample sizes and potentially non-normal distributions (common in TDA features), Kruskal-Wallis is more reliable and makes fewer assumptions.
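A minimal sketch of running the test, here via SciPy rather than the notebook's Julia tooling, on simulated values (the group means below are illustrative, not the study's data):

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)
# five illustrative groups of 5 "entropy" values each (simulated)
groups = [rng.normal(mu, 0.2, size=5) for mu in (4.5, 4.0, 4.1, 3.8, 4.1)]

# H-statistic and p-value; small p => at least one group's ranks differ
stat, p = kruskal(*groups)
```

Because the test ranks all 25 values jointly, a single outlier web cannot dominate the statistic the way it would in ANOVA.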
8.1 Kruskal-Wallis Tests
Kruskal-Wallis Tests for Group Differences:
entropy: p = 0.004 *
n_features: p = 0.0495 *
max_persistence: p = 0.0123 *
total_persistence: p = 0.2011
A p-value tells you if groups differ statistically, but effect size tells you how much they differ in practical terms.
Cohen’s d measures the standardized difference between two group means:
\[d = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}}\]
where \(s_{pooled}\) is the pooled standard deviation of both groups.
Interpretation guidelines:
| \|d\| | Magnitude | Interpretation |
|---|---|---|
| < 0.2 | Negligible | Groups nearly identical |
| 0.2 – 0.5 | Small | Detectable but minor difference |
| 0.5 – 0.8 | Medium | Noticeable practical difference |
| > 0.8 | Large | Substantial, meaningful difference |
Why effect size matters: With large samples, even tiny differences can be “statistically significant” (p < 0.05) but practically meaningless. Effect size helps distinguish meaningful differences from trivial ones.
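A minimal Python sketch of the pooled-SD formula above (for illustration):

```python
import numpy as np

def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return float((x.mean() - y.mean()) / np.sqrt(pooled_var))

# groups one pooled SD apart -> d = -1.0, a "large" effect
d = cohens_d([1, 2, 3], [2, 3, 4])
```

Note the sign carries direction: negative d means the first group's mean is below the second's, matching the negative entropy differences reported below.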
A permutation test is a non-parametric method to compute p-values without assuming any particular distribution:
- Calculate the observed difference between groups (e.g., difference in mean entropy)
- Randomly shuffle group labels many times (e.g., 10,000 permutations)
- Recalculate the difference after each shuffle
- Count how often the shuffled difference exceeds the observed difference
- p-value = (count + 1) / (n_permutations + 1)
Advantages:
- No distributional assumptions — works for any data
- Works with any test statistic
- Provides exact p-values even for small samples
- Intuitive interpretation: “how often would we see this difference by chance?”
Used here: We compare each drug group to CONTROL using permutation tests to get reliable p-values.
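The five steps above can be sketched directly (Python, illustrative; the notebook runs its tests in Julia, and the numbers in the demo call are made up):

```python
import numpy as np

def permutation_test(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.
    p = (#{|shuffled diff| >= |observed diff|} + 1) / (n_perm + 1)."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = abs(np.mean(x) - np.mean(y))
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)               # shuffle group labels
        diff = abs(perm[:len(x)].mean() - perm[len(x):].mean())
        count += diff >= observed
    return (count + 1) / (n_perm + 1)

# illustrative entropy values for a drug group vs control
p = permutation_test(np.array([4.1, 4.0, 4.2, 3.9, 4.0]),
                     np.array([3.6, 3.5, 3.8, 3.4, 3.7]), n_perm=2000)
```

The "+1" in numerator and denominator guarantees p > 0 and makes the test valid even with few permutations.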
When we test multiple features across multiple drug groups, we increase the chance of false positives. With 4 drugs × 3 features = 12 tests at α = 0.05, we expect about 0.6 false positives by chance alone.
Recommendations for interpreting results:
- Focus on results with p < 0.01 (more stringent threshold)
- Prioritize findings with large effect sizes (|d| > 0.8)
- Look for consistent patterns across related features (e.g., both entropy and n_cycles showing similar direction)
Results that meet multiple criteria (low p-value AND large effect size AND consistent pattern) are most reliable.
8.2 Pairwise Drug Comparisons with Effect Sizes
| Row | drug | feature | diff_pct | cohens_d | effect_size | p_value |
|---|---|---|---|---|---|---|
| | SubStrin… | String | Float64 | Float64 | String | Float64 |
| 1 | CIPERMETRINA | entropy | -10.5 | -2.31 | large | 0.0077 |
| 2 | ENDOSULFAN | entropy | -9.5 | -1.76 | large | 0.0304 |
| 3 | GLIFOSATO | entropy | -15.3 | -2.77 | large | 0.0029 |
| 4 | SPINOSAD | entropy | -8.5 | -2.02 | large | 0.0158 |
| 5 | CIPERMETRINA | n_cycles | -33.4 | -1.63 | large | 0.0294 |
| 6 | ENDOSULFAN | n_cycles | -27.3 | -1.27 | large | 0.0616 |
| 7 | GLIFOSATO | n_cycles | -39.3 | -1.86 | large | 0.0174 |
| 8 | SPINOSAD | n_cycles | -27.7 | -1.39 | large | 0.0462 |
| 9 | CIPERMETRINA | max_persistence | 111.5 | 1.46 | large | 0.0613 |
| 10 | ENDOSULFAN | max_persistence | 146.7 | 1.64 | large | 0.0382 |
| 11 | GLIFOSATO | max_persistence | 265.4 | 3.16 | large | 0.0028 |
| 12 | SPINOSAD | max_persistence | 113.1 | 1.99 | large | 0.0202 |
Multiple comparison correction:
Number of tests: 12
Bonferroni-corrected α = 0.0042
Comparisons significant after correction: 2
9 Classification
Beyond hypothesis testing, we can ask: can we automatically identify which drug a spider was exposed to based on its web’s topological features? This is a classification task.
KNN is one of the simplest classification algorithms. To classify a new sample:
- Compute the distance from the new sample to all training samples
- Find the k nearest neighbors (k closest training samples)
- Assign the majority class among those neighbors
Key parameter: k (number of neighbors)
- Small k (e.g., k=1 or k=3): More sensitive to local patterns, but also to noise
- Large k (e.g., k=10+): More robust, but may miss subtle differences
We use k=3 as a balanced choice that captures local structure without being overly sensitive to outliers.
Distance metric matters: We test both Wasserstein distance (comparing persistence diagrams directly) and Euclidean distance (comparing extracted features).
The problem: If we train and test on the same data, we get overly optimistic accuracy because the model has “seen” the answers. We need to estimate performance on unseen data.
9.0.1 Leave-One-Out Cross-Validation (LOOCV)
- Remove one sample from the dataset
- Train the model on the remaining n-1 samples
- Predict the class of the held-out sample
- Repeat for every sample
- Accuracy = proportion of correct predictions
Pros: uses maximum training data; deterministic (same result every time)
Cons: computationally expensive; can have high variance
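LOOCV for KNN on a precomputed distance matrix (such as the Wasserstein matrix) can be sketched as follows (Python, for illustration; the toy data are hypothetical):

```python
import numpy as np
from collections import Counter

def knn_loocv_accuracy(D, labels, k=3):
    """LOOCV for KNN on a precomputed n×n distance matrix D:
    each sample is classified by majority vote of its k nearest
    neighbours among the remaining n-1 samples."""
    n = len(labels)
    correct = 0
    for i in range(n):
        d = D[i].copy()
        d[i] = np.inf                          # exclude the held-out sample
        neighbours = np.argsort(d)[:k]
        vote = Counter(labels[j] for j in neighbours).most_common(1)[0][0]
        correct += vote == labels[i]
    return correct / n

# toy check: two tight 1D clusters -> perfect LOOCV accuracy
pts = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
D = np.abs(pts - pts.T)
acc = knn_loocv_accuracy(D, ["a", "a", "a", "b", "b", "b"], k=3)
```

Because only a distance matrix is needed, the same loop works unchanged for Wasserstein, bottleneck, or Euclidean distances.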
9.0.2 K-Fold Cross-Validation
- Split data into k equal folds (e.g., k=5)
- For each fold: train on k-1 folds, test on the remaining fold
- Average accuracy across all folds
Pros: good balance of bias and variance; faster than LOOCV
Cons: results vary slightly depending on the random split (we report mean ± std)
9.0.3 Interpreting Results
- LOOCV accuracy: Single number, deterministic
- K-fold accuracy: Reported as mean ± standard deviation
- Higher accuracy = better classification; with 5 balanced classes, chance level is 20%, so anything clearly above 20% indicates real signal
Beyond overall accuracy, we report per-class metrics:
| Metric | What it measures | Formula |
|---|---|---|
| Precision | Of samples predicted as class X, how many are truly X? | TP / (TP + FP) |
| Recall | Of samples truly in class X, how many did we identify? | TP / (TP + FN) |
| F1 Score | Harmonic mean of precision and recall | 2 × (P × R) / (P + R) |
Interpreting the confusion matrix:
- Diagonal elements: Correct predictions (true positives for each class)
- Off-diagonal elements: Errors — reading row i, column j means “sample truly in class i was predicted as class j”
- A perfect classifier has all counts on the diagonal
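These per-class metrics follow directly from the formulas in the table; a minimal Python sketch:

```python
from collections import defaultdict

def per_class_metrics(y_true, y_pred):
    """Precision, recall and F1 per class from paired label lists."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # diagonal of the confusion matrix
        else:
            fp[p] += 1          # predicted as p, but wrong
            fn[t] += 1          # truly t, but missed
    out = {}
    for c in set(y_true) | set(y_pred):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out[c] = (prec, rec, f1)
    return out

m = per_class_metrics(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

A class with zero true positives (as CIPERMETRINA shows below) gets precision, recall, and F1 of 0 by convention.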
9.1 KNN with Wasserstein Distance (LOOCV)
Accuracy (KNN Wasserstein k=3): 44.0%
Per-class metrics:
CIPERMETRINA: precision=0.0, recall=0.0, f1=0.0
CONTROL: precision=0.67, recall=0.8, f1=0.73
ENDOSULFAN: precision=0.33, recall=0.2, f1=0.25
GLIFOSATO: precision=0.67, recall=0.4, f1=0.5
SPINOSAD: precision=0.44, recall=0.8, f1=0.57
9.2 KNN on Vectorized Features (5-fold CV)
Accuracy (KNN vectorized, 5-fold): 28.0% +/- 22.8%
9.2.1 Confusion Matrix: Vectorized Features (LOOCV)
To understand which classes are confused, we run LOOCV to get predictions:
LOOCV Accuracy: 24.0%
Per-class metrics (Vectorized Features):
CIPERMETRINA: precision=0.0, recall=0.0, f1=0.0
CONTROL: precision=0.0, recall=0.0, f1=0.0
ENDOSULFAN: precision=0.6, recall=0.6, f1=0.6
GLIFOSATO: precision=0.33, recall=0.2, f1=0.25
SPINOSAD: precision=0.2, recall=0.4, f1=0.27
9.3 KNN on Rich Stats (5-fold CV)
Accuracy (KNN rich stats, 5-fold): 40.0% +/- 14.1%
9.3.1 Confusion Matrix: Rich Stats (LOOCV)
LOOCV Accuracy: 36.0%
Per-class metrics (Rich Stats):
CIPERMETRINA: precision=0.0, recall=0.0, f1=0.0
CONTROL: precision=0.67, recall=0.8, f1=0.73
ENDOSULFAN: precision=0.5, recall=0.4, f1=0.44
GLIFOSATO: precision=0.67, recall=0.4, f1=0.5
SPINOSAD: precision=0.17, recall=0.2, f1=0.18
9.4 Classification Comparison
=== Classification Methods Comparison ===
1. KNN Wasserstein (k=3, LOOCV): 44.0%
2. KNN Vectorized Features (k=3, 5-fold): 28.0% +/- 22.8%
3. KNN Rich Stats (k=3, 5-fold): 40.0% +/- 14.1%
10 Method Comparison: TDA vs Traditional Approaches
To validate that TDA provides unique value beyond mathematical sophistication, we compare against traditional image analysis methods.
10.1 Why Compare Methods?
This comparison answers a critical question for expert reviewers: “Why use TDA when simpler methods might work?”
We test three alternative approaches:
1. PCA on raw pixels: dimensionality reduction on flattened images
2. Handcrafted features: domain-informed image statistics
3. TDA methods: our topological approach (for reference)
- If TDA outperforms: Topological structure captures information that pixel-level methods miss
- If alternatives perform similarly: TDA may be unnecessarily complex for this problem
- If PCA dominates: Raw pixel patterns sufficient; topology adds little value
This provides empirical justification (or refutation) of the TDA methodology choice.
10.2 Alternative Feature Extraction
10.2.1 Method 1: PCA on Raw Pixels
Extracting raw pixel features...
Raw pixel matrix size: (25, 10000)
(25 samples × 10000 pixels)
Applying PCA...
PCA with 20 components:
Variance explained: 90.0%
Reduced dimensions: 20
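PCA on the flattened pixel matrix can be sketched via the SVD; an illustrative NumPy version with random stand-in data (the notebook itself runs this on the actual web images, in Julia):

```python
import numpy as np

def pca_reduce(X, n_components=20):
    """PCA via SVD of the centered data matrix; returns component scores
    and the fraction of total variance the kept components explain."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2 / (len(X) - 1)
    explained = var[:n_components].sum() / var.sum()
    return Xc @ Vt[:n_components].T, float(explained)

# 25 "images" of 10000 pixels, as in the notebook (random stand-in data)
X = np.random.default_rng(0).random((25, 10000))
scores, explained = pca_reduce(X, 20)
```

With only 25 samples, the centered matrix has rank at most 24, so 20 components necessarily capture most of the variance regardless of whether that variance is biologically meaningful.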
10.2.2 Method 2: Handcrafted Image Features
Extracting handcrafted features...
Handcrafted feature matrix size: (25, 8)
Features: ["mean_intensity", "std_intensity", "max_intensity", "edge_strength", "center_dist", "spread_y", "spread_x", "density"]
10.3 Classification Performance Comparison
=== Method Comparison Results ===
Method | Features | Accuracy (Mean ± SD) | Range
--------------------------------------------------------------------------------
Handcrafted Features | 8 | 48.0% ± 11.0% | [40.0%, 60.0%]
TDA: Rich Stats (H1) | 12 | 40.0% ± 14.1% | [20.0%, 60.0%]
TDA: Vectorized Diagram | 362 | 28.0% ± 22.8% | [0.0%, 60.0%]
PCA on Pixels (20 comp) | 20 | 16.0% ± 8.9% | [0.0%, 20.0%]
10.4 Interpretation
=== Key Findings ===
Best TDA method: TDA: Rich Stats (H1)
Accuracy: 40.0% ± 14.1%
Best alternative: Handcrafted Features
Accuracy: 48.0% ± 11.0%
TDA advantage: -8.0 percentage points
⚠ Traditional methods OUTPERFORM TDA
Consider simpler approaches for this problem
10.5 Strengths and Weaknesses of Each Approach
| Method | Strengths | Weaknesses | Best Use Case |
|---|---|---|---|
| TDA Rich Stats | Interpretable features; topologically invariant; few features (low overfitting) | Loses spatial info; requires TDA expertise | Small samples, need for interpretability |
| TDA Vectorized | Captures full diagram; multi-scale information | High-dimensional; harder to interpret | Large samples, complex structure |
| PCA on Pixels | Simple baseline; no domain knowledge needed | Sensitive to rotation/translation; high-dimensional input | Quick baseline, large datasets |
| Handcrafted Features | Fast computation; domain-informed | Requires expert feature engineering; may miss subtle patterns | When domain knowledge is available |
Comparing methods demonstrates that:
- TDA is not arbitrary: Empirical evidence shows whether topology adds value
- Interpretability vs accuracy trade-off: TDA rich stats offer interpretable features with competitive accuracy
- Small sample robustness: With N=25, lower-dimensional TDA features (12 dims) may generalize better than high-dimensional pixel features (10,000 dims → 20 PCA components)
This comparison strengthens the methodological contribution by showing TDA provides unique value rather than just mathematical sophistication.
11 Feature Importance
| Row | feature | importance |
|---|---|---|
| | Symbol | Float64 |
| 1 | birth_range | 0.098 |
| 2 | std_persistence | 0.082 |
| 3 | q75 | 0.074 |
| 4 | q90 | 0.054 |
| 5 | entropy | 0.05 |
| 6 | median_birth | 0.04 |
| 7 | n_features | 0.022 |
| 8 | total_persistence | 0.022 |
| 9 | q25 | 0.008 |
| 10 | max_persistence | -0.0 |
| 11 | median_persistence | -0.022 |
| 12 | q50 | -0.044 |
12 Biological Interpretation
12.1 Feature Meaning
| Dimension | Web Structure | Interpretation |
|---|---|---|
| H0 (components) | Disconnected fragments | More H0 = broken/fragmented web |
| H1 (loops/cycles) | Closed cells/meshes | More H1 = more closed cells |
| Entropy H1 | Cell uniformity | High entropy = regular cells |
| Max persistence H1 | Largest hole/gap | High = large gap in web |
12.2 Drug Effects Summary
Drug Effects Compared to CONTROL:
CONTROL baseline - Entropy: 4.5, H1 count: 97.6
CIPERMETRINA:
Entropy: 4.027 (-10.5%)
H1 count: 65.0 (-33.4%)
Effect: Fewer closed cells, More irregular cells
ENDOSULFAN:
Entropy: 4.072 (-9.5%)
H1 count: 71.0 (-27.3%)
Effect: Fewer closed cells
GLIFOSATO:
Entropy: 3.813 (-15.3%)
H1 count: 59.2 (-39.3%)
Effect: Fewer closed cells, More irregular cells
SPINOSAD:
Entropy: 4.117 (-8.5%)
H1 count: 70.6 (-27.7%)
Effect: Fewer closed cells
13 Enhanced Separability Analysis
This section provides rigorous statistical evidence for two key hypotheses:
- CONTROL is clearly separable from all drug-treated groups
- Drug classes are NOT easily separable from each other
13.1 Distance Combination
We combine Wasserstein distance (topological structure) with Euclidean distance (rich statistics features) to potentially improve classification.
=== Distance Combination Optimization ===
Best alpha: 0.5
Best accuracy: 44.0%
Interpretation:
alpha = 1.0 means pure Wasserstein distance
alpha = 0.0 means pure Euclidean (rich stats) distance
=== Classification Accuracy Comparison ===
Wasserstein only: 44.0%
Euclidean only: 36.0%
Combined (α=0.5): 44.0%
13.2 Binary Classification: Control vs Drug
Collapsing all drugs into a single “DRUG” class tests whether CONTROL can be clearly distinguished from treated webs.
=== Binary Classification: CONTROL vs DRUG ===
Accuracy: 88.0%
95% CI: [75.9%, 100.0%]
Sensitivity (Control recall): 80.0%
Specificity (Drug recall): 90.0%
13.2.1 ROC Curve Analysis
The ROC curve shows how well we can detect CONTROL samples using distance to the Control centroid.
ROC AUC: 0.955
Interpretation:
AUC > 0.9: Excellent discrimination
AUC 0.8-0.9: Good discrimination
AUC 0.7-0.8: Fair discrimination
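AUC can also be computed directly as a rank statistic: the probability that a randomly chosen positive sample scores above a randomly chosen negative one. A minimal Python sketch (the scores here are hypothetical; in this analysis the score would be, e.g., negative distance to the CONTROL centroid):

```python
import numpy as np

def roc_auc(scores_pos, scores_neg):
    """AUC as P(random positive scores higher than random negative),
    i.e. the normalized Mann-Whitney U; ties count as 1/2."""
    pos = np.asarray(scores_pos, float)[:, None]
    neg = np.asarray(scores_neg, float)[None, :]
    return float(((pos > neg).sum() + 0.5 * (pos == neg).sum())
                 / (pos.size * neg.size))

auc = roc_auc([0.9, 0.8, 0.7], [0.3, 0.2, 0.1])  # perfectly separated
```

This rank formulation makes clear why AUC is threshold-free: it only depends on the ordering of scores, not their absolute values.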
13.3 Separability Metrics
13.3.1 Within-Class vs Between-Class Distance Ratios
A lower ratio indicates better class separation. Ratios above 0.8 suggest overlapping classes.
=== Within/Between Distance Ratios ===
Full 5-class: 0.739 - moderately separated
Binary (Ctrl/Drug): 0.607 - moderately separated
Drugs only (4-class): 0.839 - overlapping
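The ratio reported above can be computed directly from a distance matrix; a minimal Python sketch on a toy matrix (hypothetical values):

```python
import numpy as np

def within_between_ratio(D, labels):
    """Mean within-class distance divided by mean between-class distance
    (off-diagonal pairs only); lower means better separation."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off = ~np.eye(len(labels), dtype=bool)   # drop self-distances
    return float(D[same & off].mean() / D[~same].mean())

D = np.array([[0, 1, 9, 9],
              [1, 0, 9, 9],
              [9, 9, 0, 1],
              [9, 9, 1, 0]], float)
ratio = within_between_ratio(D, ["a", "a", "b", "b"])  # two tight clusters
```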
13.3.2 Silhouette Score Analysis
Silhouette scores measure how well-defined each cluster is. Higher is better:
- > 0.5: good separation
- 0.25–0.5: weak separation
- < 0.25: poor separation (overlapping)
=== Silhouette Scores by Class ===
Overall mean: -0.016
SPINOSAD: 0.024 (poor)
CIPERMETRINA: -0.027 (poor)
CONTROL: -0.013 (poor)
ENDOSULFAN: 0.001 (poor)
GLIFOSATO: -0.067 (poor)
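For illustration, per-sample silhouettes can be computed from a precomputed distance matrix as follows (Python sketch assuming every class has at least two members; the toy matrix is hypothetical):

```python
import numpy as np

def silhouette_scores(D, labels):
    """Per-sample silhouette from a distance matrix:
    s = (b - a) / max(a, b), where a = mean distance to own class
    (excluding self) and b = smallest mean distance to another class."""
    labels = np.asarray(labels)
    s = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        own[i] = False                      # exclude self-distance
        a = D[i, own].mean()
        b = min(D[i, labels == c].mean()
                for c in set(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

D = np.array([[0, 1, 9, 9],
              [1, 0, 9, 9],
              [9, 9, 0, 1],
              [9, 9, 1, 0]], float)
sil = silhouette_scores(D, ["a", "a", "b", "b"])   # near +1: tight clusters
```

Scores near 0, as in the table above, mean samples sit roughly equidistant from their own and the nearest foreign class.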
13.3.3 Pairwise Group Distances
| Row | group1 | group2 | mean_distance | std_distance | n_pairs |
|---|---|---|---|---|---|
| | String | String | Float64 | Float64 | Int64 |
| 1 | CIPERMETRINA | CIPERMETRINA | 191.607 | 59.6837 | 20 |
| 2 | CIPERMETRINA | CONTROL | 403.084 | 135.293 | 25 |
| 3 | CIPERMETRINA | ENDOSULFAN | 244.223 | 61.5255 | 25 |
| 4 | CIPERMETRINA | GLIFOSATO | 264.904 | 90.2555 | 25 |
| 5 | CIPERMETRINA | SPINOSAD | 186.761 | 60.8101 | 25 |
| 6 | CONTROL | CONTROL | 322.777 | 202.577 | 20 |
| 7 | CONTROL | ENDOSULFAN | 324.871 | 164.328 | 25 |
| 8 | CONTROL | GLIFOSATO | 537.583 | 136.117 | 25 |
| 9 | CONTROL | SPINOSAD | 409.455 | 147.488 | 25 |
| 10 | ENDOSULFAN | ENDOSULFAN | 223.484 | 37.978 | 20 |
| 11 | ENDOSULFAN | GLIFOSATO | 325.272 | 88.6281 | 25 |
| 12 | ENDOSULFAN | SPINOSAD | 253.009 | 49.6141 | 25 |
| 13 | GLIFOSATO | GLIFOSATO | 276.809 | 62.9035 | 20 |
| 14 | GLIFOSATO | SPINOSAD | 282.145 | 90.0662 | 25 |
| 15 | SPINOSAD | SPINOSAD | 178.751 | 41.1936 | 20 |
13.4 PERMANOVA Tests
PERMANOVA tests whether group centroids differ significantly in multivariate space. It works directly on the Wasserstein distance matrix.
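The test needs only the distance matrix: between-group sums of squares are recovered from pairwise distances (Anderson's identity), and the null distribution comes from permuting labels. A Python sketch of this logic on toy data (the real analysis applies it to the Wasserstein matrix):

```python
import numpy as np

def permanova(D, labels, n_perm=999, seed=0):
    """One-way PERMANOVA on a precomputed distance matrix.

    Returns the pseudo-F statistic and a permutation p-value."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n, groups = len(labels), np.unique(labels)
    a = len(groups)
    D2 = D ** 2

    def pseudo_f(lab):
        iu = np.triu_indices(n, k=1)
        ss_total = D2[iu].sum() / n
        ss_within = 0.0
        for g in groups:
            idx = np.where(lab == g)[0]
            sub = D2[np.ix_(idx, idx)]
            ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
        ss_between = ss_total - ss_within
        return (ss_between / (a - 1)) / (ss_within / (n - a))

    f_obs = pseudo_f(labels)
    hits = sum(pseudo_f(rng.permutation(labels)) >= f_obs for _ in range(n_perm))
    return f_obs, (hits + 1) / (n_perm + 1)

# Toy example: two noisy but clearly separated groups
rng_data = np.random.default_rng(1)
x = np.concatenate([rng_data.normal(0, 1, 5), rng_data.normal(10, 1, 5)])
D = np.abs(x[:, None] - x[None, :])
f, p = permanova(D, [0] * 5 + [1] * 5, n_perm=199)
print(f"Pseudo-F: {f:.2f}, p = {p:.4f}")
```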
13.4.1 Control vs Drugs
=== PERMANOVA: Control vs Drugs ===
Pseudo-F: 10.81
p-value: 0.0001
✓ CONTROL centroid significantly differs from DRUG centroid (p < 0.05)
13.4.2 Drug Equivalence Test
Testing whether drug groups differ from each other (excluding CONTROL).
=== PERMANOVA: Among Drugs Only ===
Pseudo-F: 3.25
p-value: 0.0001
Interpretation: Some drug differences detected
13.4.3 Pairwise Drug Comparisons
Testing each pair of drugs to see if they can be statistically distinguished.
=== Pairwise Drug Permutation Tests (Entropy) ===
| Row | drug1 | drug2 | mean_diff | p_value | significant | interpretation |
|---|---|---|---|---|---|---|
| 1 | CIPERMETRINA | ENDOSULFAN | 0.0450259 | 0.70443 | false | NOT distinguishable |
| 2 | CIPERMETRINA | GLIFOSATO | 0.214446 | 0.060494 | false | NOT distinguishable |
| 3 | CIPERMETRINA | SPINOSAD | 0.089826 | 0.19618 | false | NOT distinguishable |
| 4 | ENDOSULFAN | GLIFOSATO | 0.259471 | 0.105689 | false | NOT distinguishable |
| 5 | ENDOSULFAN | SPINOSAD | 0.0448001 | 0.716228 | false | NOT distinguishable |
| 6 | GLIFOSATO | SPINOSAD | 0.304272 | 0.00939906 | true | distinguishable |
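Each row above is a two-sample permutation test on a scalar feature (here, H1 entropy). A Python sketch of the procedure with hypothetical entropy values (not the real measurements):

```python
import numpy as np

def perm_test(a, b, n_perm=9999, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = np.random.default_rng(seed)
    obs = abs(np.mean(a) - np.mean(b))
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        hits += abs(pooled[:len(a)].mean() - pooled[len(a):].mean()) >= obs
    return obs, (hits + 1) / (n_perm + 1)

# Hypothetical H1 entropy values for two drug groups (5 webs each)
glifosato = np.array([2.9, 3.1, 3.0, 3.2, 2.8])
spinosad  = np.array([2.5, 2.6, 2.7, 2.4, 2.6])
diff, p = perm_test(glifosato, spinosad, n_perm=999)
print(f"mean_diff = {diff:.3f}, p = {p:.3f}")
```

With N=5 per group there are only C(10,5) = 252 distinct splits, which caps how small a p-value can get; this is the precision limit noted in the limitations section.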
13.5 Confusion Analysis
Which classes are most often confused with each other?
=== Top Confusion Pairs ===
| Row | true_class | predicted_class | confusion_rate (%) | count |
|---|---|---|---|---|
| 1 | CIPERMETRINA | SPINOSAD | 60.0 | 3 |
| 2 | ENDOSULFAN | CIPERMETRINA | 40.0 | 2 |
| 3 | ENDOSULFAN | CONTROL | 40.0 | 2 |
| 4 | GLIFOSATO | SPINOSAD | 40.0 | 2 |
| 5 | CIPERMETRINA | ENDOSULFAN | 20.0 | 1 |
| 6 | CIPERMETRINA | GLIFOSATO | 20.0 | 1 |
| 7 | CONTROL | ENDOSULFAN | 20.0 | 1 |
| 8 | GLIFOSATO | CIPERMETRINA | 20.0 | 1 |
| 9 | SPINOSAD | CIPERMETRINA | 20.0 | 1 |
| 10 | CIPERMETRINA | CONTROL | 0.0 | 0 |
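Such a table is a per-pair renormalization of the confusion matrix: count each (true, predicted) pair and divide by the true class size. A Python sketch with hypothetical predictions chosen to be consistent with the CIPERMETRINA and CONTROL rows above:

```python
from collections import Counter

# Hypothetical LOOCV predictions for two of the five groups (5 webs each)
y_true = ["CIPERMETRINA"] * 5 + ["CONTROL"] * 5
y_pred = ["SPINOSAD", "SPINOSAD", "SPINOSAD", "ENDOSULFAN", "GLIFOSATO",
          "CONTROL", "CONTROL", "CONTROL", "CONTROL", "ENDOSULFAN"]

counts = Counter(zip(y_true, y_pred))       # (true, predicted) pair counts
n_per_class = Counter(y_true)               # class sizes for normalization
for (t, pr), c in counts.most_common():
    if t != pr:                             # report misclassifications only
        print(f"{t} -> {pr}: {100 * c / n_per_class[t]:.1f}% ({c})")
```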
13.6 Summary: Separability Evidence
============================================================
SEPARABILITY ANALYSIS SUMMARY
============================================================
### Evidence that CONTROL is SEPARABLE ###
Binary classification accuracy: 88.0%
ROC AUC: 0.955
PERMANOVA (Ctrl vs Drugs) p-value: 0.0001
Control silhouette score: -0.013 (the one weak metric)
Conclusion: ✓ CONTROL IS CLEARLY SEPARABLE
### Evidence that DRUGS are NOT easily separable ###
Drug-only PERMANOVA p-value: 0.0001 (some overall differences, but pairwise tests rarely distinguish specific drugs)
Drugs-only within/between ratio: 0.839
Mean drug silhouette: -0.017
Conclusion: ✓ DRUGS ARE NOT EASILY SEPARABLE
============================================================
14 Limitations and Future Directions
14.1 Methodological Limitations
14.1.1 Sample Size
N=5 per group is insufficient for robust statistical inference:
- Effect size estimates have very wide confidence intervals
- High risk of Type II error (missing true effects)
- Classification accuracy likely overestimated due to overfitting
- Permutation tests have limited precision with small sample sizes
Impact: Results should be viewed as exploratory and hypothesis-generating, not confirmatory.
14.1.2 Parameter Selection
All preprocessing and TDA parameters were chosen heuristically without systematic optimization:
| Parameter | Value Used | Justification |
|---|---|---|
| blur | 2 | Heuristic choice (not optimized) |
| threshold | 0.1 | Visual inspection (not data-driven) |
| sample_size | 1000 points | Computational convenience (not validated) |
| rips_cutoff | 5 | Arbitrary choice (no sensitivity analysis shown) |
| k (KNN) | 3 | Standard default (not tuned) |
| Wasserstein | (p=1, q=2) | Not compared to alternatives |
Impact: Results may be sensitive to these choices. A systematic sensitivity analysis would strengthen conclusions (recommended for future work).
14.1.3 Validation Strategy
No independent validation dataset
- All reported accuracies use cross-validation on the same 25 samples
- High risk of overfitting to sample-specific patterns
- True generalization performance likely lower than reported
Recommended validation hierarchy (for future studies):
1. Level 1: Internal validation (current LOOCV)
2. Level 2: Temporal validation (same cohort, different timepoints)
3. Level 3: External validation (independent lab, different spiders)
14.2 Biological Limitations
14.2.1 Uncontrolled Confounders
The original dataset lacks metadata for critical experimental variables:
- Spider biology: Species, age, sex, size not recorded
- Drug protocol: Dosages, exposure duration, administration method unknown
- Environmental conditions: Temperature, humidity, light not controlled
- Web collection: Time post-exposure unclear; web completeness varies
Impact: Observed differences could reflect confounding variables rather than drug effects alone.
14.2.2 Mechanism Unclear
TDA detects structural differences but doesn’t explain why:
- H1 features (loops/cells) may reflect motor control, silk production, cognitive effects, or combinations
- Different drugs with different mechanisms show similar topological signatures
- Requires neurobiology/toxicology expertise for causal interpretation
14.2.3 Generalizability Unknown
- Results specific to one spider species (unidentified in dataset)
- Drug effects may vary across species, life stages, dosages
- Environmental context (lab vs. field) not specified
- Replication on independent datasets essential
14.3 Statistical Concerns
14.3.1 Multiple Testing
- 12+ statistical tests conducted without strict family-wise error rate control
- Bonferroni correction mentioned (α = 0.004) but not consistently applied
- With small N, correction further reduces power
Approach taken: Report raw p-values; prioritize effect sizes and consistency across tests
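The Bonferroni threshold quoted above (α ≈ 0.004, i.e. 0.05/12) can be applied to the six pairwise drug p-values from Section 13.4.3; notably, even the one nominally significant pair (GLIFOSATO vs SPINOSAD, p ≈ 0.009) does not survive correction. A short sketch:

```python
# Bonferroni control: with m tests, compare each raw p-value to alpha/m.
# m = 12 matches the alpha = 0.004 mentioned above; raw_p holds the six
# pairwise drug p-values from the table in Section 13.4.3.
m, alpha = 12, 0.05
raw_p = [0.70443, 0.060494, 0.19618, 0.105689, 0.716228, 0.00939906]
threshold = alpha / m
for p in raw_p:
    verdict = "significant" if p < threshold else "not significant"
    print(f"p = {p:.4f} -> {verdict} at corrected alpha = {threshold:.4f}")
```

A less conservative step-down procedure such as Holm's would reach the same conclusion here, since even the smallest raw p-value exceeds the strictest threshold.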
14.3.2 Cross-Validation Variance
- LOOCV on N=25 has high variance
- K-fold CV (k=5) results show wide standard deviations
- Single train/test split would be even less reliable given small N
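To make the variance concern concrete, here is a Python sketch of LOOCV with a 3-NN classifier on a precomputed distance matrix, the same scheme as the analysis but on synthetic stand-in data (the real pipeline uses the 25×25 Wasserstein matrix):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(2)

# Synthetic 25-sample, 5-class dataset and its pairwise distance matrix
X = np.vstack([rng.normal(loc=g, scale=0.8, size=(5, 4)) for g in range(5)])
y = np.repeat(np.arange(5), 5)
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)

correct = 0
for train, test in LeaveOneOut().split(D):
    # metric="precomputed": fit on the train-train block of D,
    # predict from the test-train block
    knn = KNeighborsClassifier(n_neighbors=3, metric="precomputed")
    knn.fit(D[np.ix_(train, train)], y[train])
    correct += knn.predict(D[np.ix_(test, train)])[0] == y[test][0]
print(f"LOOCV accuracy: {correct}/{len(y)}")
```

Rerunning with different seeds shows how much the accuracy estimate swings at this sample size, which is why single-split estimates would be even less trustworthy.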
14.3.3 Distance Metric Choice
- Wasserstein distance chosen arbitrarily
- No comparison to Bottleneck distance or other metrics
- Different metrics may yield different classification results
14.4 What This Study IS and IS NOT
14.4.1 ✓ What This Study IS
- Proof-of-concept demonstrating TDA’s applicability to toxicological screening
- Methodological contribution showing topological features capture web structure
- Hypothesis-generating exploratory analysis identifying promising features
- Reproducible pipeline with documented code and methods
14.4.2 ✗ What This Study is NOT
- NOT definitive biological conclusions about drug effects
- NOT generalizable beyond this specific dataset
- NOT adequately powered for detecting small-to-medium effects
- NOT validated on independent data
14.5 Strengths Despite Limitations
Despite small sample size and methodological constraints, this work demonstrates:
- Novel application: First TDA analysis of drug-induced spider webs
- Clear CONTROL separation: Binary classification (88% accuracy) suggests drugs do affect web topology
- Reproducible methods: All code and analysis fully documented
- Transparent limitations: Honest acknowledgment of constraints builds trust
14.6 Future Directions
14.6.1 Immediate Improvements (Next Study)
- Increase sample size: Target N ≥ 20 per group for adequate power
- Independent validation: Collect hold-out test set (70/30 train/test split)
- Systematic parameter tuning: Grid search with nested cross-validation
- Controlled experiments: Standardize spider species, age, drug dosage, exposure time
14.6.2 Advanced Methodology
- Comparison to baselines: Test against CNN classifiers and traditional image features
- Multi-scale analysis: Vary filtration parameters systematically
- Statistical topology methods: Use recent advances for inference on persistence diagrams
- Temporal dynamics: If possible, track same spiders building multiple webs
14.6.3 Biological Integration
- Dose-response curves: Test multiple drug concentrations
- Behavioral correlates: Link topological features to specific motor/cognitive deficits
- Multi-species comparison: Test generalizability across spider families
- Mechanism investigation: Use neurobiology techniques to validate TDA findings
14.7 Conclusions with Appropriate Caveats
This exploratory analysis demonstrates that:
- TDA captures drug effects: CONTROL webs are topologically distinguishable from drug-treated webs
- H1 features are informative: Entropy, cycle count, and persistence metrics vary systematically
- Drug classes overlap: Different drugs produce similar topological signatures, suggesting common disruption pathways
- Method shows promise: TDA provides interpretable, geometrically-motivated features
However, with N=5 per group and no independent validation, these findings require:
- Replication on larger, independent datasets
- Systematic parameter validation
- Controlled experimental conditions
- Biological mechanism investigation
This work is best viewed as methodological proof-of-concept rather than definitive toxicological findings.
Citation
@online{vituri_f._pinto2026,
  author = {Vituri F. Pinto, Guilherme and Telmo and ??????},
  title = {Classifying Drug-Induced Webs Using {Topological} {Data}
    {Analysis}},
  date = {2026-02-06},
  langid = {en},
  abstract = {We studied etc etc etc}
}